Keyword Search Result

[Keyword] deep learning(167hit)

81-100hit(167hit)

  • Triplet Attention Network for Video-Based Person Re-Identification

    Rui SUN  Qili LIANG  Zi YANG  Zhenghui ZHAO  Xudong ZHANG  

     
    LETTER-Image Recognition, Computer Vision

      Pubricized:
    2021/07/21
      Vol:
    E104-D No:10
      Page(s):
    1775-1779

    Video-based person re-identification (re-ID) aims at retrieving person across non-overlapping camera and has achieved promising results owing to deep convolutional neural network. Due to the dynamic properties of the video, the problems of background clutters and occlusion are more serious than image-based person Re-ID. In this letter, we present a novel triple attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to get robust and discriminative feature. Specifically, the network has two parts, where the first part introduces a residual attention subnetwork, which contains channel attention module to capture cross-dimension dependencies by using rotation and transformation and spatial attention module to focus on pedestrian feature. In the second part, a time attention module is designed to judge the quality score of each pedestrian, and to reduce the weight of the incomplete pedestrian image to alleviate the occlusion problem. We evaluate our proposed architecture on three datasets, iLIDS-VID, PRID2011 and MARS. Extensive comparative experimental results show that our proposed method achieves state-of-the-art results.

  • Conditional Wasserstein Generative Adversarial Networks for Rebalancing Iris Image Datasets

    Yung-Hui LI  Muhammad Saqlain ASLAM  Latifa Nabila HARFIYA  Ching-Chun CHANG  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2021/06/01
      Vol:
    E104-D No:9
      Page(s):
    1450-1458

    The recent development of deep learning-based generative models has sharply intensified the interest in data synthesis and its applications. Data synthesis takes on an added importance especially for some pattern recognition tasks in which some classes of data are rare and difficult to collect. In an iris dataset, for instance, the minority class samples include images of eyes with glasses, oversized or undersized pupils, misaligned iris locations, and iris occluded or contaminated by eyelids, eyelashes, or lighting reflections. Such class-imbalanced datasets often result in biased classification performance. Generative adversarial networks (GANs) are one of the most promising frameworks that learn to generate synthetic data through a two-player minimax game between a generator and a discriminator. In this paper, we utilized the state-of-the-art conditional Wasserstein generative adversarial network with gradient penalty (CWGAN-GP) for generating the minority class of iris images which saves huge amount of cost of human labors for rare data collection. With our model, the researcher can generate as many iris images of rare cases as they want and it helps to develop any deep learning algorithm whenever large size of dataset is needed.

  • Capsule Network with Shortcut Routing Open Access

    Thanh Vu DANG  Hoang Trong VO  Gwang Hyun YU  Jin Young KIM  

     
    PAPER-Image

      Pubricized:
    2021/01/27
      Vol:
    E104-A No:8
      Page(s):
    1043-1050

    Capsules are fundamental informative units that are introduced into capsule networks to manipulate the hierarchical presentation of patterns. The part-hole relationship of an entity is learned through capsule layers, using a routing-by-agreement mechanism that is approximated by a voting procedure. Nevertheless, existing routing methods are computationally inefficient. We address this issue by proposing a novel routing mechanism, namely “shortcut routing”, that directly learns to activate global capsules from local capsules. In our method, the number of operations in the routing procedure is reduced by omitting the capsules in intermediate layers, resulting in lighter routing. To further address the computational problem, we investigate an attention-based approach, and propose fuzzy coefficients, which have been found to be efficient than mixture coefficients from EM routing. Our method achieves on-par classification results on the Mnist (99.52%), smallnorb (93.91%), and affNist (89.02%) datasets. Compared to EM routing, our fuzzy-based and attention-based routing methods attain reductions of 1.42 and 2.5 in terms of the number of calculations.

  • Video Inpainting by Frame Alignment with Deformable Convolution

    Yusuke HARA  Xueting WANG  Toshihiko YAMASAKI  

     
    PAPER-Image Processing and Video Processing

      Pubricized:
    2021/04/22
      Vol:
    E104-D No:8
      Page(s):
    1349-1358

    Video inpainting is a task of filling missing regions in videos. In this task, it is important to efficiently use information from other frames and generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method jointly using affine transformation and deformable convolutions for frame alignment. The former is responsible for frame-scale rough alignment and the latter performs pixel-level fine alignment. Our model does not depend on 3D convolutions, which limits the temporal window, or troublesome flow estimation. The proposed method achieves improved object removal results and better PSNR and SSIM values compared with previous learning-based methods.

  • Hybrid Electrical/Optical Switch Architectures for Training Distributed Deep Learning in Large-Scale

    Thao-Nguyen TRUONG  Ryousei TAKANO  

     
    PAPER-Information Network

      Pubricized:
    2021/04/23
      Vol:
    E104-D No:8
      Page(s):
    1332-1339

    Data parallelism is the dominant method used to train deep learning (DL) models on High-Performance Computing systems such as large-scale GPU clusters. When training a DL model on a large number of nodes, inter-node communication becomes bottle-neck due to its relatively higher latency and lower link bandwidth (than intra-node communication). Although some communication techniques have been proposed to cope with this problem, all of these approaches target to deal with the large message size issue while diminishing the effect of the limitation of the inter-node network. In this study, we investigate the benefit of increasing inter-node link bandwidth by using hybrid switching systems, i.e., Electrical Packet Switching and Optical Circuit Switching. We found that the typical data-transfer of synchronous data-parallelism training is long-lived and rarely changed that can be speed-up with optical switching. Simulation results on the Simgrid simulator show that our approach speed-up the training time of deep learning applications, especially in a large-scale manner.

  • An Efficient Deep Learning Based Coarse-to-Fine Cephalometric Landmark Detection Method

    Yu SONG  Xu QIAO  Yutaro IWAMOTO  Yen-Wei CHEN  Yili CHEN  

     
    PAPER-Image Processing and Video Processing

      Pubricized:
    2021/05/14
      Vol:
    E104-D No:8
      Page(s):
    1359-1366

    Accurate and automatic quantitative cephalometry analysis is of great importance in orthodontics. The fundamental step for cephalometry analysis is to annotate anatomic-interested landmarks on X-ray images. Computer-aided automatic method remains to be an open topic nowadays. In this paper, we propose an efficient deep learning-based coarse-to-fine approach to realize accurate landmark detection. In the coarse detection step, we train a deep learning-based deformable transformation model by using training samples. We register test images to the reference image (one training image) using the trained model to predict coarse landmarks' locations on test images. Thus, regions of interest (ROIs) which include landmarks can be located. In the fine detection step, we utilize trained deep convolutional neural networks (CNNs), to detect landmarks in ROI patches. For each landmark, there is one corresponding neural network, which directly does regression to the landmark's coordinates. The fine step can be considered as a refinement or fine-tuning step based on the coarse detection step. We validated the proposed method on public dataset from 2015 International Symposium on Biomedical Imaging (ISBI) grand challenge. Compared with the state-of-the-art method, we not only achieved the comparable detection accuracy (the mean radial error is about 1.0-1.6mm), but also largely shortened the computation time (4 seconds per image).

  • CJAM: Convolutional Neural Network Joint Attention Mechanism in Gait Recognition

    Pengtao JIA  Qi ZHAO  Boze LI  Jing ZHANG  

     
    PAPER

      Pubricized:
    2021/04/28
      Vol:
    E104-D No:8
      Page(s):
    1239-1249

    Gait recognition distinguishes one individual from others according to the natural patterns of human gaits. Gait recognition is a challenging signal processing technology for biometric identification due to the ambiguity of contours and the complex feature extraction procedure. In this work, we proposed a new model - the convolutional neural network (CNN) joint attention mechanism (CJAM) - to classify the gait sequences and conduct person identification using the CASIA-A and CASIA-B gait datasets. The CNN model has the ability to extract gait features, and the attention mechanism continuously focuses on the most discriminative area to achieve person identification. We present a comprehensive transformation from gait image preprocessing to final identification. The results from 12 experiments show that the new attention model leads to a lower error rate than others. The CJAM model improved the 3D-CNN, CNN-LSTM (long short-term memory), and the simple CNN by 8.44%, 2.94% and 1.45%, respectively.

  • Secret Key Generation Scheme Based on Deep Learning in FDD MIMO Systems

    Zheng WAN  Kaizhi HUANG  Lu CHEN  

     
    LETTER-Artificial Intelligence, Data Mining

      Pubricized:
    2021/04/07
      Vol:
    E104-D No:7
      Page(s):
    1058-1062

    In this paper, a deep learning-based secret key generation scheme is proposed for FDD multiple-input and multiple-output (MIMO) systems. We built an encoder-decoder based convolutional neural network to characterize the wireless environment to learn the mapping relationship between the uplink and downlink channel. The designed neural network can accurately predict the downlink channel state information based on the estimated uplink channel state information without any information feedback. Random secret keys can be generated from downlink channel responses predicted by the neural network. Simulation results show that deep learning based SKG scheme can achieve significant performance improvement in terms of the key agreement ratio and achievable secret key rate.

  • Multi-View Texture Learning for Face Super-Resolution

    Yu WANG  Tao LU  Feng YAO  Yuntao WU  Yanduo ZHANG  

     
    PAPER-Image Recognition, Computer Vision

      Pubricized:
    2021/03/24
      Vol:
    E104-D No:7
      Page(s):
    1028-1038

    In recent years, single face image super-resolution (SR) using deep neural networks have been well developed. However, most of the face images captured by the camera in a real scene are from different views of the same person, and the existing traditional multi-frame image SR requires alignment between images. Due to multi-view face images contain texture information from different views, which can be used as effective prior information, how to use this prior information from multi-views to reconstruct frontal face images is challenging. In order to effectively solve the above problems, we propose a novel face SR network based on multi-view face images, which focus on obtaining more texture information from multi-view face images to help the reconstruction of frontal face images. And in this network, we also propose a texture attention mechanism to transfer high-precision texture compensation information to the frontal face image to obtain better visual effects. We conduct subjective and objective evaluations, and the experimental results show the great potential of using multi-view face images SR. The comparison with other state-of-the-art deep learning SR methods proves that the proposed method has excellent performance.

  • Attention Voting Network with Prior Distance Augmented Loss for 6DoF Pose Estimation

    Yong HE  Ji LI  Xuanhong ZHOU  Zewei CHEN  Xin LIU  

     
    PAPER-Image Recognition, Computer Vision

      Pubricized:
    2021/03/26
      Vol:
    E104-D No:7
      Page(s):
    1039-1048

    6DoF pose estimation from a monocular RGB image is a challenging but fundamental task. The methods based on unit direction vector-field representation and Hough voting strategy achieved state-of-the-art performance. Nevertheless, they apply the smooth l1 loss to learn the two elements of the unit vector separately, resulting in which is not taken into account that the prior distance between the pixel and the keypoint. While the positioning error is significantly affected by the prior distance. In this work, we propose a Prior Distance Augmented Loss (PDAL) to exploit the prior distance for more accurate vector-field representation. Furthermore, we propose a lightweight channel-level attention module for adaptive feature fusion. Embedding this Adaptive Fusion Attention Module (AFAM) into the U-Net, we build an Attention Voting Network to further improve the performance of our method. We conduct extensive experiments to demonstrate the effectiveness and performance improvement of our methods on the LINEMOD, OCCLUSION and YCB-Video datasets. Our experiments show that the proposed methods bring significant performance gains and outperform state-of-the-art RGB-based methods without any post-refinement.

  • Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization

    Koichi SHIRAHATA  Amir HADERBACHE  Naoto FUKUMOTO  Kohta NAKASHIMA  

     
    BRIEF PAPER

      Pubricized:
    2020/12/01
      Vol:
    E104-C No:6
      Page(s):
    257-260

    Scalability of distributed DNN training can be limited by slowdown of specific processes due to unexpected hardware failures. We propose a dynamic process exclusion technique so that training throughput is maximized. Our evaluation using 32 processes with ResNet-50 shows that our proposed technique reduces slowdown by 12.5% to 50% without accuracy loss through excluding the slow processes.

  • Differentially Private Neural Networks with Bounded Activation Function

    Kijung JUNG  Hyukki LEE  Yon Dohn CHUNG  

     
    LETTER-Artificial Intelligence, Data Mining

      Pubricized:
    2021/03/18
      Vol:
    E104-D No:6
      Page(s):
    905-908

    Deep learning has shown outstanding performance in various fields, and it is increasingly deployed in privacy-critical domains. If sensitive data in the deep learning model are exposed, it can cause serious privacy threats. To protect individual privacy, we propose a novel activation function and stochastic gradient descent for applying differential privacy to deep learning. Through experiments, we show that the proposed method can effectively protect the privacy and the performance of proposed method is better than the previous approaches.

  • MTGAN: Extending Test Case set for Deep Learning Image Classifier

    Erhu LIU  Song HUANG  Cheng ZONG  Changyou ZHENG  Yongming YAO  Jing ZHU  Shiqi TANG  Yanqiu WANG  

     
    PAPER-Software Engineering

      Pubricized:
    2021/02/05
      Vol:
    E104-D No:5
      Page(s):
    709-722

    During the recent several years, deep learning has achieved excellent results in image recognition, voice processing, and other research areas, which has set off a new upsurge of research and application. Internal defects and external malicious attacks may threaten the safe and reliable operation of a deep learning system and even cause unbearable consequences. The technology of testing deep learning systems is still in its infancy. Traditional software testing technology is not applicable to test deep learning systems. In addition, the characteristics of deep learning such as complex application scenarios, the high dimensionality of input data, and poor interpretability of operation logic bring new challenges to the testing work. This paper focuses on the problem of test case generation and points out that adversarial examples can be used as test cases. Then the paper proposes MTGAN which is a framework to generate test cases for deep learning image classifiers based on Generative Adversarial Network. Finally, this paper evaluates the effectiveness of MTGAN.

  • HAIF: A Hierarchical Attention-Based Model of Filtering Invalid Webpage

    Chaoran ZHOU  Jianping ZHAO  Tai MA  Xin ZHOU  

     
    PAPER

      Pubricized:
    2021/02/25
      Vol:
    E104-D No:5
      Page(s):
    659-668

    In Internet applications, when users search for information, the search engines invariably return some invalid webpages that do not contain valid information. These invalid webpages interfere with the users' access to useful information, affect the efficiency of users' information query and occupy Internet resources. Accurate and fast filtering of invalid webpages can purify the Internet environment and provide convenience for netizens. This paper proposes an invalid webpage filtering model (HAIF) based on deep learning and hierarchical attention mechanism. HAIF improves the semantic and sequence information representation of webpage text by concatenating lexical-level embeddings and paragraph-level embeddings. HAIF introduces hierarchical attention mechanism to optimize the extraction of text sequence features and webpage tag features. Among them, the local-level attention layer optimizes the local information in the plain text. By concatenating the input embeddings and the feature matrix after local-level attention calculation, it enriches the representation of information. The tag-level attention layer introduces webpage structural feature information on the attention calculation of different HTML tags, so that HAIF is better applicable to the Internet resource field. In order to evaluate the effectiveness of HAIF in filtering invalid pages, we conducted various experiments. Experimental results demonstrate that, compared with other baseline models, HAIF has improved to various degrees on various evaluation criteria.

  • Action Recognition Using Pose Data in a Distributed Environment over the Edge and Cloud

    Chikako TAKASAKI  Atsuko TAKEFUSA  Hidemoto NAKADA  Masato OGUCHI  

     
    PAPER

      Pubricized:
    2021/02/02
      Vol:
    E104-D No:5
      Page(s):
    539-550

    With the development of cameras and sensors and the spread of cloud computing, life logs can be easily acquired and stored in general households for the various services that utilize the logs. However, it is difficult to analyze moving images that are acquired by home sensors in real time using machine learning because the data size is too large and the computational complexity is too high. Moreover, collecting and accumulating in the cloud moving images that are captured at home and can be used to identify individuals may invade the privacy of application users. We propose a method of distributed processing over the edge and cloud that addresses the processing latency and the privacy concerns. On the edge (sensor) side, we extract feature vectors of human key points from moving images using OpenPose, which is a pose estimation library. On the cloud side, we recognize actions by machine learning using only the feature vectors. In this study, we compare the action recognition accuracies of multiple machine learning methods. In addition, we measure the analysis processing time at the sensor and the cloud to investigate the feasibility of recognizing actions in real time. Then, we evaluate the proposed system by comparing it with the 3D ResNet model in recognition experiments. The experimental results demonstrate that the action recognition accuracy is the highest when using LSTM and that the introduction of dropout in action recognition using 100 categories alleviates overfitting because the models can learn more generic human actions by increasing the variety of actions. In addition, it is demonstrated that preprocessing using OpenPose on the sensor side can substantially reduce the transfer quantity from the sensor to the cloud.

  • Deep Network for Parametric Bilinear Generalized Approximate Message Passing and Its Application in Compressive Sensing under Matrix Uncertainty

    Jingjing SI  Wenwen SUN  Chuang LI  Yinbo CHENG  

     
    LETTER-Digital Signal Processing

      Pubricized:
    2020/09/29
      Vol:
    E104-A No:4
      Page(s):
    751-756

    Deep learning is playing an increasingly important role in signal processing field due to its excellent performance on many inference problems. Parametric bilinear generalized approximate message passing (P-BiG-AMP) is a new approximate message passing based approach to a general class of structure-matrix bilinear estimation problems. In this letter, we propose a novel feed-forward neural network architecture to realize P-BiG-AMP methodology with deep learning for the inference problem of compressive sensing under matrix uncertainty. Linear transforms utilized in the recovery process and parameters involved in the input and output channels of measurement are jointly learned from training data. Simulation results show that the trained P-BiG-AMP network can achieve higher reconstruction performance than the P-BiG-AMP algorithm with parameters tuned via the expectation-maximization method.

  • Backbone Alignment and Cascade Tiny Object Detecting Techniques for Dolphin Detection and Classification

    Yih-Cherng LEE  Hung-Wei HSU  Jian-Jiun DING  Wen HOU  Lien-Shiang CHOU  Ronald Y. CHANG  

     
    PAPER-Image

      Pubricized:
    2020/09/29
      Vol:
    E104-A No:4
      Page(s):
    734-743

    Automatic tracking and classification are essential for studying the behaviors of wild animals. Owing to dynamic far-shooting photos, the occlusion problem, protective coloration, the background noise is irregular interference for designing a computerized algorithm for reducing human labeling resources. Moreover, wild dolphin images are hard-acquired by on-the-spot investigations, which takes a lot of waiting time and hardly sets the fixed camera to automatic monitoring dolphins on the ocean in several days. It is challenging tasks to detect well and classify a dolphin from polluted photos by a single famous deep learning method in a small dataset. Therefore, in this study, we propose a generic Cascade Small Object Detection (CSOD) algorithm for dolphin detection to handle small object problems and develop visualization to backbone based classification (V2BC) for removing noise, highlighting features of dolphin and classifying the name of dolphin. The architecture of CSOD consists of the P-net and the F-net. The P-net uses the crude Yolov3 detector to be a core network to predict all the regions of interest (ROIs) at lower resolution images. Then, the F-net, which is more robust, is applied to capture the ROIs from high-resolution photos to solve single detector problems. Moreover, a visualization to backbone based classification (V2BC) method focuses on extracting significant regions of occluded dolphin and design significant post-processing by referencing the backbone of dolphins to facilitate for classification. Compared to the state of the art methods, including faster-rcnn, yolov3 detection and Alexnet, the Vgg, and the Resnet classification. All experiments show that the proposed algorithm based on CSOD and V2BC has an excellent performance in dolphin detection and classification. Consequently, compared to the related works of classification, the accuracy of the proposed designation is over 14% higher. Moreover, our proposed CSOD detection system has 42% higher performance than that of the original Yolov3 architecture.

  • Robustness of Deep Learning Models in Dermatological Evaluation: A Critical Assessment

    Sourav MISHRA  Subhajit CHAUDHURY  Hideaki IMAIZUMI  Toshihiko YAMASAKI  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2020/12/22
      Vol:
    E104-D No:3
      Page(s):
    419-429

    Our paper attempts to critically assess the robustness of deep learning methods in dermatological evaluation. Although deep learning is being increasingly sought as a means to improve dermatological diagnostics, the performance of models and methods have been rarely investigated beyond studies done under ideal settings. We aim to look beyond results obtained on curated and ideal data corpus, by investigating resilience and performance on user-submitted data. Assessing via few imitated conditions, we have found the overall accuracy to drop and individual predictions change significantly in many cases despite of robust training.

  • Benchmarking Modern Edge Devices for AI Applications

    Pilsung KANG  Jongmin JO  

     
    PAPER-Computer System

      Pubricized:
    2020/12/08
      Vol:
    E104-D No:3
      Page(s):
    394-403

    AI (artificial intelligence) has grown at an overwhelming speed for the last decade, to the extent that it has become one of the mainstream tools that drive the advancements in science and technology. Meanwhile, the paradigm of edge computing has emerged as one of the foremost areas in which applications using the AI technology are being most actively researched, due to its potential benefits and impact on today's widespread networked computing environments. In this paper, we evaluate two major entry-level offerings in the state-of-the-art edge device technology, which highlight increased computing power and specialized hardware support for AI applications. We perform a set of deep learning benchmarks on the devices to measure their performance. By comparing the performance with other GPU (graphics processing unit) accelerated systems in different platforms, we assess the computational capability of the modern edge devices featuring a significant amount of hardware parallelism.

  • Multi-Category Image Super-Resolution with Convolutional Neural Network and Multi-Task Learning

    Kazuya URAZOE  Nobutaka KUROKI  Yu KATO  Shinya OHTANI  Tetsuya HIROSE  Masahiro NUMA  

     
    PAPER-Image Processing and Video Processing

      Pubricized:
    2020/10/02
      Vol:
    E104-D No:1
      Page(s):
    183-193

    This paper presents an image super-resolution technique using a convolutional neural network (CNN) and multi-task learning for multiple image categories. The image categories include natural, manga, and text images. Their features differ from each other. However, several CNNs for super-resolution are trained with a single category. If the input image category is different from that of the training images, the performance of super-resolution is degraded. There are two possible solutions to manage multi-categories with conventional CNNs. The first involves the preparation of the CNNs for every category. This solution, however, requires a category classifier to select an appropriate CNN. The second is to learn all categories with a single CNN. In this solution, the CNN cannot optimize its internal behavior for each category. Therefore, this paper presents a super-resolution CNN architecture for multiple image categories. The proposed CNN has two parallel outputs for a high-resolution image and a category label. The main CNN for the high-resolution image is a normal three convolutional layer-architecture, and the sub neural network for the category label is branched out from its middle layer and consists of two fully-connected layers. This architecture can simultaneously learn the high-resolution image and its category using multi-task learning. The category information is used for optimizing the super-resolution. In an applied setting, the proposed CNN can automatically estimate the input image category and change the internal behavior. Experimental results of 2× image magnification have shown that the average peak signal-to-noise ratio for the proposed method is approximately 0.22 dB higher than that for the conventional super-resolution with no difference in processing time and parameters. We have ensured that the proposed method is useful when the input image category is varying.

81-100hit(167hit)

FlyerIEICE has prepared a flyer regarding multilingual services. Please use the one in your native language.